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Abstract 

We study the least-squares (LS) functional of the canonical polyadic (CP) tensor decomposition. Our 
approach is based on the elimination of one factor matrix which results in a reduced functional. The 
reduced functional is reformulated into a projection framework and into a Rayleigh quotient. An analysis 
of this functional leads to several conclusions: new sufficient conditions for the existence of minimizers 
of the LS functional, the existence of a critical point in the rank-one case, a heuristic explanation of 
"swamping" and computable bounds on the minimal value of the LS functional. The latter result 
leads to a simple algorithm - the Centroid Projection algorithm - to compute suboptimal solutions 
of tensor decompositions. These suboptimal solutions are applied to iterative CP algorithms as initial 
guesses, yielding a method called centroid projection for canonical polyadic (CPCP) decomposition 
which provides a significant speedup in our numerical experiments compared to the standard methods. 

Keywords: tensor decomposition, nonlinear least-squares method 



1 Introduction 

In 1927, Hitchcock HH] introduced the idea that a tensor is decomposable into a sum of a finite number 
of rank-one tensors. Today, this decomposition is referred to as the canonical polyadic (CP) tensor decom- 
position (also known as CANDECOMP [2] or PARAFAC |7J. CP tensor decomposition reduces a tensor 
to a linear combination of rank-one tensors, i.e. 

R 

(,'A)ijk ^ Q>irbjrC-kr (1*1) 

where A G R IxJxK , a r = {a ir ){ =l e R 1 , b r = (V)/=i e rJ and c r = ( c fcr)£Li € R K . The column vectors 
a r , b r and c r form the so-called factor matrices A € M /Xfl , B S ]R Jxi? and C € R K xR . The tensorial rank 
[TU] is the minimum R £ N such that T can be expressed as a sum of R rank-one tensors. 

The problem of interest is to find - if it exists - the best approximate tensor representable in a CP 
format with a tensorial rank R from a given (possibly noisy) tensor T eR IxJxK ■ A standard approach for 
this task is to minimize the norm of the residual tensor in the least-square sense: 

3(A, B, C) = l ~Y, [Oliih - J2 UrbjrCkr J ■ (1.2) 
i,j,k \ r=l / 

A popular iterative method for approximating the given tensor T via its factors (A, B, C) is called 
the Alternating Least-Squares (ALS) technique. Independently, ALS was introduced by Carol and Chang 
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P] and Harshman :7! in 1970. The ALS method is an application of the nonlinear block Gauss-Seidel 
algorithm |15j where the nonlinear optimization (1.2 1 is reduced into several least-squares subproblems 
which are solved iteratively with subsequent updates of the factor minimizer. The ALS algorithm has been 
extensively applied to many problems across various engineering and science disciplines; e.g., see the survey 
paper [TT] and the references therein. Despite the widespread popularity of ALS, it has its shortcomings. 
Problems can arise in degenerate problems and slow converging nondegenerate problems with CP solutions. 
To this end, methods like regularization techniques [T7] and enhanced line search [3T] are improvements of 
ALS. There are also several methods based on other techniques, such as, conjugate gradient [18] and Schur 
decomposition [23] for CP decomposition; see the paper of Comon et al. [3] on the survey of ALS methods. 

In this paper, we analyze the minimization of the objective function (1.2) by eliminating one factor A, 
reducing to a minimization over the factor minimizers B and C, equivalent to the original one. Analysis of 
the reduced functional allows reformulations into several forms: as a Rayleigh quotient type functional or 
as an weighted projection onto the Khatri-Rao range of B and C. As a consequence, we prove sufficient 
conditions for the existence of the minimizer of (1.2) in terms of the rank of the Khatri-Rao matrix which 
substantiates well-known facts about the degeneracy case, like the diverging norms of the factors and that 
the solution space is not closed [THEEE]. Furthermore, for the special case of rank-1 decomposition, we 
show - using Morse theory - the existence of a critical point which can lead to a halt of the ALS algorithms 
at nonextremal points. Poor convergence (swamping) of the ALS algorithm can be attributed to the feasible 
set, the Khatri-Rao range of B and C, of the reduced objective functional. 

Further analysis of the reduced objective functional provides upper and lower bounds. The minimizers 
of the upper bound turn out to be computable by linear algebra methods, yielding an effective and simple 
algorithm (the Centroid Projection (CP) Algorithm) for computing suboptimal solutions to (1.2). The 
suboptimal solutions may serve as a initial guesses to any iterative CP-decomposition methods like ALS or 
other advanced algorithms. We will refer to this powerful combination as the CPCP method. In our numer- 
ical examples, the Centroid Projection have shown to improve performance of several iterative methods for 
CP decomposition in comparison to the examples with random initial starters. Moreover, initialization of 
the upper bound minimizers works well for CP decomposition with symmetries [22| . that is, when at least 
two of the factors are identical. 



2 Preliminaries 

We denote the scalars in K with lower-case letters (a, b, . . .) and the vectors with bold lower-case letters 
(a, b, . . .). The matrices are written as bold upper-case letters (A, B, . . .) and the symbol for tensors are 
calligraphic letters (A,B, ...). The subscripts represent the following scalars: (A) i j k — (A)jj = a.y, 
(a)j = Oi. The superscripts indicate the length of the vector or the size of the matrices. For example, h K is 
a vector with length K and ^ NxK is a N x K matrix. In addition, the lower-case superscripts on a matrix 
indicate the mode in which has been matricized. 

The order of a tensor refers to the cardinality of the index set. A matrix is a second-order tensor and a 
vector is a first-order tensor. The scalar product of T, 72 £ K /x JxK is defined as 

ijk 

The Frobenius norm of A <E R IxJxK is defined as 

IJK 

i=i j= k=i 

which is a direct extension of the Frobenius norm of a matrix. Furthermore we denote by • the usual matrix 
product. 

Definition 2.1 The Khatri-Rao product of A e R IxR and B e R JxR is defined as 

A B = [ai ® bi a 2 <g> b 2 ... a R <g> b R ] € R IJxR 



when A = [a 1 a 2 ... a R ] and B = [b x b 2 ... b R ] . 
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Here, a® b denotes the Kronecker product of two vectors aeR', b 6 M J yielding a vector of size I J with 
entries that are all possible products of the entries in a and b. 

Definition 2.2 (Tucker mode-n product) Given a tensor T £ ^fixhxh anc [ matrices A £ K /lXjl , 
B £ M. 12 x 2 and C £ M 73 x Ja , then the Tucker mode-n products are the following: 



T»i A := (T«i A)j 1 i 3i3 
T»2 B := (T»2 B)i 1J2 i 3 
T»3 C := (T»3 C) 4l i 2 j 3 



^2 U 1 i 2 i 3 a ii: j 1 , Vji,i 2 ,i3 (mode- 1 product) 
•1=1 

^2 

^iui2»3^2j2i Vj 2 ,«i,i3 (mode- 2 product) 
is 

^1*2^3^3^3! Vj'3, ii,i2 (mode-3 product). 



Moreover, the Tucker mode products can be combined as in this example: 

T » 2 ,3 (B, C) := (T » 2 ,3 (B, C)) nr := X! T lli2 i 3 b l2r c i3r 

12 = 1 «3 = 1 

where B e R /2Xii and C e R 1 ^. 

Definition 2.3 (outer product of vectors) For vectors a € l/, b £ M. J the outer product & oh is the 
I x J matrix with entries 

(aob)j,j = a l b. ] , 

similarly, the outer product of three vectors aeR ; , h £ M. J , c £ M. K is the I x J x K tensor 

(a o b o cji^fc = dibjCk, Vi,j, 

3 The least squares functional and its reduction 



Recall the least-squares objective functional in |jl.2[): 

1 



a(A,B,C) 



T - ^ a r o b r 



(3.1) 



where || ■ is the Frobcnius norm. The goal is to find minimizers A, B and C of 

inf 3(A,B,C). 

A.B,C 

Note that it is well-known that this infimum is not necessarily attained see, e.g., [6]. 
Lemma 3.1 Let B, C be fixed. The solution to the minimization problem 

A[B,C] := argmin AeK ixR3(A, B, C) 
exists. In fact, a minimizer is given by 

A[B,C]=T. 2 ,3 (B,C)-G+, 
where is the pseudo-inverse of G with elements 



and 



J K 

(T »2,3 (B, C))i r := 2_j TijkbjrCkr- 
3 = 1 fc=l 



(3.2) 
(3.3) 

(3.4) 
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Proof. With B,C being fixed, (3.2) is a usual finite dimensional linear least squares problem for which it 
is well-known that a solution exists. Differentiation of the functional ( 1.2 1 with respect to ai* r - leads to the 
optimality conditions 



_d 
Oa 



d ( R \ 
3(A, B,C) = 2_j 7~i*jk — ^^ai* r bj r c kr bj r *c kr * = 0. 

j,k \ r=l / 



tain the matrix equation 

A-G = T. 2 , 3 (B,C) 

We know that a (not necessarily unique) solution exists, which then is expressible in terms of the pseudo- 
inverse (3.3). □ 



From the definition (3.4), 



(G) rs = fy^ftjr&js j (^2 CkrCk ^j = ( b r C r ) T (b s C s ), 



it follows that 

G = (B C) T • (B C) e R RxR , (3.5) 

is a Gramian matrix for the vectors b r ® c r , r = 1, . . . R as well as the Hadamard product of B T • B and 
C T • C. Note that G depends on B and C but we omitted this dependence to avoid exuberant notation. It 
follows easily that G is symmetric, and thus G^ is. Moreover, the pseudo-inverse satisfies the Moore-Penrose 
equation Gt G Gt = Gt. 

Motivated by the ALS algorithm, which iteratively minimizes over the factors matrices, we state the 
main tool in this paper, the reduced functional. Minimization over A reduces the original functional so that 
we now define 

a r „i(B,C):=3(A[B,C],B,C) (3-6) 



where A[B,C] is a minimizer in (3.2). This definition does not depend which minimizer we take. In the 
following lemma, we show that the minimizers of Z can be found through the minimizers of Zred- 

Proposition 3.2 //{B„,C n } is a minimizing sequence for Zred, then {A[B„, C n ], B„, C n } is a minimizing 
sequence of Z and the equality. 



inf 3(A,B,C) - infX ed (B,C), 

A,B,C B,C 



holds. 



Proof. Given that {B„, C n } is a minimizing sequence of Zred- hm„_j. 00 3red(B„, C„) — > infe.c 3red(B, C). 
Since inf a,b,c 3(A, B, C) < 3(A[B„, C n ], B„, C n ) = 3red(B„, C n ), we obtain 

inf 3T(A,B,C) < inf 3red(B, C) 

A,B,C B.C 

by passing to the limit. On the other hand, J(A, B, C) > 3(A[B, C], B, C) = 3red(B,C) > inf B ,c Zred(B, C) 
for arbitrary A,B,C. It follows that inf a,b,c 3(A, B, C) > infB,c Zred (B, C). □ 

Corollary 3.3 If (B„,C«) are minimizers of Zred, then (A[B*, C*], B*, C*) are minimizers ofZ- 



Proof. From Lemma 



3.1 



the factor A[B,C] always exists. Then if a minimizer (B*,C*) of 3red(B,C) 
exists, then (A(B*, C*), B*, C*) also exists and it is a minimizer of Z- D 



5 



3.1 Analysis of the reduced objective function 

The introduction of Zred reduces the number of unknown factors by one. In this section, we explicitly 
calculate Zred- Define 



■Ma/3-fS '■= '^^TiapTi.yS € ^ JxKxJy 



K 



(3.7) 



i=l 



a fourth order tensor from a contracted product over one index of two identical third-order tensors. The 
matricization M <E R JKxJK f M £ r-^kxJxk j g defined by the following: 

(M) a p 7 s — > (M) y 



where i — [a+[fi— 1) J] and j = [7+(<5— 1) J]. From (3.7), we have the symmetry {M) a f3 1 s = (M)-ySa/3 which 
implies that the matrix M is symmetric; i.e. M.y = Mjj. It was shown in [T] that due to the isomorphic 
group structures between the sets of invertible tensors and matrices: a symmetric ((SA) UVWX = [M) WXU v) 
fourth order tensor M JxKxJxK has an eigendecomposition: 



M = V * S * V 3 



(3.8) 



where * is the contracted product of fourth order tensors defined as (A * B)j -jj = X^i^XjM^) kl ij given 
that the symmetric matrix M has an eigendecomposition such as M = V • S • V T where (M); m — > M uvwx , 
(V)lm -> (V) uinlra: and (S) Zm -> (5)„ ww:c with u, w = 1, . . . , J, v = and x = m ~y+ J . Note that V is 

an orthogonal matrix and S is a diagonal matrix. 

In accordance with the notation of Section [2] we can state some useful tensor- vector and tensor-matrix 
operations for a, c £ K J and b, d £ M. K : 

1. M •1,2,3,4 (a, b, C, d) := J2a,0,-f,8 MaPjSO-abpCjds £ K 

2- (-M "2,3,4 (b, c, d)) Q = J2p n ,s Mafsysbpcyds £ R 1 
3. (M -2,4 (b, d)) a , 7 = E^, T , 5 M a p lS b d s £ R IxI 
Observe that 

X -1,2,3,4 (a, b, c, d) = (T . 2 ,3 (a, b)) T (T .2,3 (c, d)) = (a <8> b) T M(c (8 d). (3.9) 



Proposition 3.4 TTie following is the reduced objective: 



^(B, C) = i M|n|| - X](G f ) S rM "1,2,3,4 (b r , C r , b s , C S )^J 



where M £ 1Z 



JxKxJxK 



defined in (3.1) and is the the pseudo-inverse of G in (3.4) 



Proof. Expanding (3.1) yields 



(3.10) 



3 re d(B,C)=3(A[B,C],B,C) = i f (T,T) -2 /r,^a r ob r oc r J + ^a r ob r oc r ,^a r ob r oc r 



Using the component-wise definition of A[B, C] in Lemma 



3.1 



R R 

air = $^(T«2, a (B,C)) ls (G t ) sr = ^ ^(7I ifc 6 is c fes )(G 

S— 1 S— 1 j'fc 



t ) 



G 



we obtain the following: 
1. 



(T, a r ob r o c r 



R R 

^ ^ ^ ^ TijkbjrC-krQir — ^ ^ 
r— 1 i,j',/e 



(3.11) 



^(G f ) s , r .A/f •1,2,3,4 ( b r, C r , b s , C s ) 



^a r ob r oc r ,^a r ob r oc r = ^ E("" , k :) ' ,Cfe '')(" tf fej-fC/cf ) = E E " ir " tf E kjr&j? X] 



r,r ijk 



= ^<a r ,a f )G rf (3.12) 

^ EE IE E (^■^ Js c fcs )(Gt) sr (7: il 6^c^)(Gt),J G,, 
From the Moore-Penrose properties, we find 

Q]a r ob r oc r ,^a r ob r oc r = ^(G t GG t ) Ss .M •1,2,3,4 (b s , c s , b§, c§) 



= ^2 G ts M *1,2,3A ( b s,c s ,b§,c§) 



Equations (3.12| and (3.131 imply that the reduced objective is given by (3.10). 



(3.13) 
□ 



We can further simplify the functional: 

Lemma 3.5 Let U, S, V be the matrices in the singular value decomposition of (B C), i.e., (B C) = 
U • £ • V T G R JKxR with U € RJKx-JK orthogonal, £ G R JK * R diagonal and V G R RxR orthogonal. 
Then, 

/ R 

3red(B, C) = - ||n|| - EK: Mu k) 
\ r=l 



where M is the matricization of M. in (3.1) and R = rank{Yl) — rank(B C) and Uk is the k-th column of 
U. 



Proof. Starting from (3.131, with the shortcut A = A[B, C] and symmetry of G^ we find 
R 

J2(G f ) sr M .1,2,3,4 (b r , c r , b s , c.) - ((B C) ■ A T , (B C) • A T ) = ((B C), (B C) ■ A T • A) 

r,s 

= ((B C), (B C) • Gt • T .2,3 (B, CfT . 2 , 3 (B, C) • G*) 

= ((B C) • Gt, (B C) • Gt • T .2,3 (B, C) T T . 2 , 3 (B, C)> 

= ((B0C)-G t ,(B0C)-G t -(B0C) T M(B0C)) (from 

= ((B C) • G f • (B C) T , (B C) • G f ■ (B C) T • M) 

= Tr(B C) ■ G f • (B C) T (B C) ■ G f ■ (B C) T ■ M 

= TrP B0 c-M, 
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where Tr denotes the matrix trace and 
P B0 c = (B C) • [(B C) T • (B C)]t • (B C) T • (B C) • [(B C) T • (B C)] t • (B C) T . (3.14) 
Since (B C) = U • S • V T and Gt = V ■ (S T • E)t ■ V T , it holds that 

R 

^(G^srM -1,2,3,4 (b r ,c r , b s , c s ) — TrU ■ Pj] ■ P|J • U T ■ M = Tr(U ■ P S ) T M ■ U P s 

r,s 

where the projector matrix Ps = S(S T S)^S T e ^JKxjk can ^ e ca | ciua ted as 

ad i * 
else 



„ , 1 if i = 7 and i < rank(X) 

( p e)«H 



Thus, finally 



^(G f ) sr X •1,2,3,4 {W,c r , b s , c s ) = ^(u r , Mu r 



□ 



The previous lemma allows us to rewrite the minimization problem for Zred into a Rayleigh quotient 
type problem. 

Theorem 3.6 The minimization problem for 3red(B,C) is equivalent to the following maximization prob- 
lem 

R 



sup y^(u r , Mu r 



(3.15) 



where Ui, . . . , Uj^ is an orthonormal basis of range(H C) with R — rank(H C). Equivalence holds 
in the following sense: if (B, C) are (approximate) minimizers of Zred(B, C), then any orthonormal basis 
of range(B C) is a(n) (approximate) maximizer of ( |3.15| ). Conversely, if u%, . . . are (approximate) 
maximizers of (3.15), then the associated (B,C) are (approximate) minimizers o/3rerf(B,C). 



Proof. From Lemma 3.5 it is clear that (approximate) minimizers of Zred are equivalent to (approximate) 
maximizers (3.15) over the left singular vectors of B C. The maximization in (3.15) can be equally well 



done over any orthonormal basis of the range of B C: let U be the R IKxR matrix with columns the left 
singular vectors corresponding to nonzero singular values. The column vectors are an orthonormal basis of 
B C. Similarly, for any other orthonormal basis of this range we can build a matrix W with columns 
the basis vectors, which is related to U by W = UQ, where Q is an R x R orthonormal matrix. By the 



invariance of the trace, the sum in (3.15) can be written as 



^(u r ,Mu r ),= Tr(U T MU) = Tr(Q T U T MUQ) = Tr(W T MW), 



which ends the proof. 



□ 



The new problem formulation ( 3.15[ ) clearly indicates why the least squares problem might not have 



a solution. Obviously, the functional X)r=i( Ur >-^'- Ur )' ^ s continuous with respect to Ui,...u^ and since 
these vectors are orthonormalized they are within a compact set. However, the additional restriction that 
{ui, . . . ugj spans the range of a Khatri-Rao product space does not necessarily induce a closed set. In 
fact, a problem arises when the rank of the Khatri-Rao product decreases for a minimizing sequence. 

Proposition 3.7 Let (B„,C„) be a minimizing sequence of Zred (and thus (A[B„, C„], B„, C„) a min- 
imizing sequence of Z)- Without loss of generality, we can assume that there exists matrices B and C 
with 

B and lim C„ = C. (3.16) 

n— >oo 



lim Bj 

n— >oo 
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If the following rank condition, 

liminf rank(B n C„) < rank(B C). (3-17) 

n— >oo 

holds, then (B, C) is a minimizer ofZred and (A[B, C], B, C) is a minimizer of Z- In particular, a solution 
to the least squares problem exists. 

Proof. We first notice that 3red(B„, C„) does not change when (B n , C„) is replaced by pp-p- , ||c" || s i nce 
the range of the Khatri-Rao product does not change. So if the original sequence is a minimizing sequence, 
then so is ( ■ ^ " n , ^ n n )„. By a compactness argument, these matrices have a converging subsequence, which 
must again be a minimizing sequence. Thus, there is no loss of generality in assuming that the minimizing 



sequence of matrices converges as in (3.16|. Let r n be the rank of (B„ C„) such that r* 
and r = rank(B C). By using a subsequence argument, we can assume without loss of generality that 
\\m n r n — > r*. Now let us consider the associated left singular vectors of (B„ C n ). The sequence of 
vectors (uj, . . . , u" K ) are normalized eigenvectors of (B„ C„)(B n C„) T and by compactness we can 
find another subsequence for which all eigenvalues of (B„ C„)(B„ C„) T converge: 

^„^oo w; i=l,...JK 

If up corresponds to an eigenvalue A™ = 0, for n sufficiently large, it is obvious that Wi is in the nullspace of 
(B C)(B C) T . On the other hand, if u" corresponds to an eigenvalue with liminf„ A™ > 0, then since 
the eigenvalues are continuous functions of the matrix we get for a subsequence that 

w,= lim uf= lim i(B„ C„)(B„ C n ) T uf = |(B C)(B C) T Wi , 

thus, Wi is also an eigenvalue of (B C)(B C) T . With r n —> we obtain 

up — > Wi i = 1, . . . r* . 

Let us denote by Wi the remaining eigenvectors spanning the range of (B C) and let N be the supremum 



in (3T5J). Then 

r n r* r 

N = lim V(uP,MuP) = V(wi,Mwi) < V(wi,Mw;) < N 



i=l 



which shows that equality holds in this formula and thus, (wi)^ =1 are maximizers of ( 3.15 ) and the associated 



matrices (B, C) are minimizers of Zred- O 



Converse to these propositions is the following result that if a minimizer does not exist, then the rank 
of the Khatri-Rao product must change in the limit for any minimizing sequence. More precisely, the rank 
of (B C) of the limit of a minimizing sequence must be lower than the limit of the rank of (B n C„). 
If this is the case, at least one singular value of (B„ C„) tends to 0. A consequence of this is that the 
pseudo-inverse becomes unbounded, and thus, the norm of A n [B n , C„] may become unbounded. This 
reflect the well-known fact of diverging summands in, see, e.g., [B], which is referred to as the degenerate 
CP case. 



3.2 Rank-1 approximation 



It is worthwhile to study the special case of a least squares approximation (3.181 with R = 1. In this case, 
it is well-known that a minimizer always exists. Moreover, the minimizers can be calculated by a Rayleigh 
quotient type maximization. From the previous calculations, we obtain the following: 

Corollary 3.8 Consider the least squares problem 

min-||T-aoboc|||. (3.18) 

a,b,c 2 
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Minimizers to this problem always exist and the vectors b, c can be found as the solution of either one of 
the following equivalent problems 

(b c) T M(b c) X .1,2,3.4 (b,c,b,c) 
max = max ' U , U9U no (3.19) 

b,c ||b|| 2 ||c|| 2 b,c ||b|| 2 ||c|| 2 

= max (b c) T M(b c) = max M »i 2 3 4 (b, c, b, c) (3.20) 
II fall =i II fa II = i 

l|c||=l ||c||=l 

Proof. In the case R = 1, the Khatri-Rao product b c reduces to b c. For any b, c, yields a 
(one-dimensional) ortlionormal basis of the range of b c. On the other hand, any normalized basis (which 
contains only one vector) can be written as a Kronecker product with normalized vectors ||b|| = 1, ||c|| = 1. 
Proposition |3.7| yields the equivalence of these problems. Without loss of generality we can take a minimizing 
sequence (b n , c n ) normalized to one. Since then neither b n nor c n can be zero vectors rank(b„ c„) = 1, 



and the rank of possible limit vectors is rank(b0c) = 1. Thus, (3.17) holds and a minimizer always exists. 
□ 



The maximizers in this corollary corresponds to the generalized singular values of M which was already 
proven in [3] by De Lathauwer, De Moor and Vanderwalle. But such characterization only holds in the case 
R = 1. Proposition |3 . 7| gives the generalization to R > 1. 

The optimality condition for the generalized Rayleigh quotient is well-known: 

Lemma 3.9 A necessary condition for a maximizers b,c with ||b|| = l,||c|| = 1 in (3.19) is that there 
exists a number A such that 



■M '2.3.4 (c, b, c) = Ab and A4 »i,3,4 (b, b, c) = Ac. 



(3.21) 



Another way of stating the optimality condition is that b is an eigenvector of the matrix M »2,4 (c, c) and 
c is an eigenvector (with the same eigenvalue) for the matrix M. «x 3 (b, b). Unfortunately, straightforward 
linear algebra techniques cannot be applied to this nonlinear problem of calculating the maximizers since 
the eigenvectors are coupled to each other. 



We now look in more detail to the critical points of the functionals (3.19). By a compactness argument, 
it is clear that the functional 



/(b,c) :S 7 " 



1 x S J ~ X 
b. c 



M •1,2,3,4 (b,c,b,c), 



(3.22) 



always has a maximum and a minimum where S n = {x € 



pn+l 



|x|| = 1} denotes the n-dimensional 



sphere. Moreover, for each maximum and minimum b, c, the corresponding antipodal points (b, — c), 
(— b, c), (— b, — c) are maxima and minima as well. Topologically, however, there must exist critical points 
of this functional which are neither maxima nor minima, except in degenerate cases. In the two-dimensional 
cases I = J = 2, a related result was shown by De Lathauwer et al. [14]. We have a general result in arbitrary 
dimensions. 



Proposition 3.10 Suppose that the maxima and minima of (3.22) are nondegenerate in the sense that the 



Hessian of f at these points is non-singular (and thus, the Hessian is either negative or positive definite). 
Then there exist at least 4 additional critical points b, c that are neither maxima or minima of ( 3.22 ) . If all 
critical points are nondegenerate, then the number of critical points with index 7, 7 = 0, ...(/— 1) + (J — 1) 
must satisfy the following conditions 



and 



Co > 4, C(j_i) + ( j_x) > 4 C~f is divisible by 4 



± Ro < C 7 - C 7l + . . . ± C V 7 = 0, . . . (/ - 1) + ( J - 1) 



where the i? 7 (the Betti- numbers) are the coefficients in the polynomial 



(1 



^)(1 



(I- 



- 1 ) 



l)+(J- 

E 

7=0 
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Proof. Suppose that besides the maximal and minimal points there are no additional critical point. Then, 
by the nondegeneracy condition, / is a Morse function Till. However, by the Morse inequalities this is 
impossible. In fact, we know that there exist at least 4 points of maxima and 4 points of minima. They 
correspond to critical point with index 7 — (7 — 1) + ( J — 1) and 7 = 0. Hence, denoting by C 7 number 
of critical points with index 7, we have C(t—\)+(j—\) > 4 and Co > 4. On the other hand, the Poincare 
polynomial [S] of S 1 - 1 x S J ' X is (1 + x 1 ' 1 )^ + x J - r ) = 1 + ar 7 " 1 + x 1 ' 1 + ^-IH^-I), by the Morse 
inequalities C/_i > 1 and Cj—i > 1, which imply the existence of critical points (neither being a maximum 
or minimum) of index I — 1 and J — 1. Since for a critical point the corresponding antipodal points will be 
critical as well, we have shown the existence of at least 4 critical points. 

If we assume a-priori that all critical points are nondegenerate, / will be a Morse function and the 
Morse inequalities as stated in the proposition must be satisfied. By the same antipodal-point argument, 
the number C 7 must always be divisible by 4. □ 



For the case I = J = 2 we obtain Co > 4, C2 > 4, C± — Co > 2, which imply that C± > 6 and C\ > 8 
due to divisibility by 4. Thus, even if we consider antipodal points as being equivalent, there must be at 
least two more critical points beside the extrema. In the case I = J = 3 the inequalities yield lower bounds 
C\ > 4, C2 > 4, C3 > 4. Up to antipodal points we have here at least three more critical points occurring 
in the case that all critical points are nondegenerate. 

The critical points of /(b, c) correspond to critical points of the original least squares functional: 



Lemma 3.11 Let b,c £ S' 1 x S J 1 be a critical point of (3.22), then with the setting (cf. [3.3)) a 
7~»2.3 (b,c) the vectors (a, b, c) satisfy the first order optimality conditions of (3.181. 



Proof. With the definition of M. and ( |3.21[ ), a critical point (b, c) satisfies 

Ab = 7i,2,3(b, c)7i,., 3 (c) = 7i,., 3 (a, c) 

i 

and 

Ac = 7i, 2 ,.(a,b) 
A in the optimality condition can be calculated to 



A = M ll2 ,3,i(b, c, b, c) = J2 71,2,30, c) 7i,a,s(b, c) = ||aj| 



Thus we obtain the optimality conditions for (3.18): 



= T»2,3(b,c) 
fib = 7i,., 3 (a,c) 
= 71,2,. (a, b), 



(3.23) 
(3.24) 
(3.25) 



with a 



and fi 



□ 



Since ALS works with the first order optimality condition, it will saturate at a critical point. Thus, we have 
the following negative result: 



Theorem 3.12 If the extrema of (3.22) are nondegenerate, then there always exists a set of vectors (a, b, c) 
which is neither a maximum nor a minimum of ( |3.18[ ) for which the ALS sequence for (3.18) remains 
constant at this point 

(a k +i, b k+ i, c k+ i) = (a k , b k , c k ) Vfc > 1. (3.26) 



Proof. Taking as starting point for the ALS iteration a critical point satisfying ( 3.23 )-( 3.25), with vectors 
ao, bo, Cq normalized to norm 1. The ALS iteration in the rank-1 case reads 



a k+ i 



7i,2,3(bk, c k ) , _ Ti,..3(a k+ i, c k ) _ Ti,2,.(a k+ i, b k+ i) 

D k+1 — nouZ 119 c k+l 



|b k || 2 ||c k || 2 



!|a k+ i|| 2 ||c k || 2 



|a k+ i|| 2 ||b 



k+l 
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It follows by induction that with the given starting value, the iteration becomes 

&k+i = a k+1 a b k+ i = /}fc+ib c k+ i = 7/c+iCo, 
where otu+i, Pk+i, 7fc+l are numbers satisfying the recursion 

"fc+i = -5 Pfc+i = 7/c+i ~~ 



Pfc7fc «fe+i7fe afe+i/3/c+i 

for fc > 1. Eliminating first a^+i yields /3/c+i = fiki and furthermore 7fe+i = 7fc for all fc > 1, hence 
a k+2 = a fe+i- Thus, we observe that the iteration remains constant (3.26). Since the extrema of Z are 
one-to-one related to extrema of Zred and hence of /(b,c) the ALS sequence remains at a point which is 
not an extrema of the least squares functional. □ 

This result shows that there is no guarantee that a converging ALS sequence yields a minimizer of 3- Of 
course, this is not surprise for a first order method. 

3.3 Reduced functional in projection form 

We now derive an alternative form of the reduced functional Zred as a weighted distance to the Khatri-Rao 
space. This form will be useful in the next section to design a simple algorithm for finding an initial guess 
to the minimization form. 

Based on Lemma |3.5| we can simplify the reduced functional taking into account the diagonalization of 

M: 

Lemma 3.13 Let (B©C) = USV T G ^ JKxR and M = VSV T with orthogonal matrices: U G R JKxJK , 
V G R RxR , V G R JKxJK and diagonal matrices: £ G R JKxR and S G R JKxJK with S = diag(Xi). Denote 
by Vj the columns of V, then, 

JK f R \ JK / R \ 

Z red (B, C) = - J> || Vi || 2 - 5>i, u r ) 2 = - £ A* 1 - ]T(vi, u r ) 2 (3.27) 

i=l V r=l J z=l V r=l J 

Proof. Observe that ||T||| = EjkiEiTijkTijk) = E jk Mjkjk = trace(M) = Tr(SV T V) = £f K Ai||v,||| = 
£V Aj since |jv ; |j| = 1. With (u r ,Mu r ) = (u r , X)/^ AjVivf u r ) = X)j Aj(vi, u r ) 2 , the result follows. □ 

New we define the Khatri-Rao range, i.e. the range of the matrix B C. This range is a subset of 
R IJ ; for later use it is convenient to define the Khatri-Rao range by matricizing this range. As usual we 
denote the columns of the matrices B and C by bi and ci: 

KR(B, C) := jx = ^ ° c i e RjXK I where m G R, j (3.28) 

It is obvious that X G R /Xj is in the Khatri-Rao range X G KR(B, C) if and only if its vectorized version 
X vec e r ik ig in the range f b C 

X vec = (B C) fi. 



Theorem 3.14 Let Vj G R 7x ^ be the matricized version of a vector Vi G R JK appearing in Lemma 3.13 
With the notation of Lemma \3.13\ the reduced least squares functional is simplified as 



JK 

5(B, C) = - (||Vj - KR(B, C)|| F ) 2 , (3.29) 



i=i 



where ||V, — KR(B, C)|| denotes the distance of Vj to the linear subspace KR(B, C) 

||Vj-KR(B,C)|| F = inf ||V 4 -X|| F . 

X£KR(B,C) 
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Proof. Let Pu be the orthogonal projector onto range(B C) since the vectors Uj are an orthogonal basis 
of this range we have PuVi = Y2r=ii^U u r )u r . The minimum distance between Vi and range(B C) can 
be expressed by the projector as : 

inf ||vi - x||| = ||vi - PuVi|||. 

xerange(Boc) 

Moreover, ||vi|||, — ||PuVi||^ = ||vi|||> — (vi, u r ) 2 = ||v, — PrjVi||fi. Since inner products and norms are 



the same for a vector and its matricization, we obtain the result from (3.27). □ 



Observe that for a particular index i there exists a set of indices (J, k) such i = j — (k — 1)J which 
implies that (Vijjk is matrix representing the subtensor V--^ (3.8). 



A simple consequence of the previous theorem is the following. 
Corollary 3.15 IfB, C and B, C are matrices that span the same Khatri-Rao range i.e. 

KR(B,C) =KR(B,C) 

then 

3red(B, C) = 3red(B, C). 

Remark 3.16 If this corollary is applied to the case whenZ = we obtain - as a special case - a uniqueness 
condition. The CP decomposition (A, B, C) is called unique up to permutation and scaling if any alterna- 
tive decomposition (A,B,C) satisfies A = AIIAi, B = BIIA2 and C = CIIA3 where II is an R x R 
permutation matrix and Aj are nonsingular matrices such that Yij=i Ai = ^-R - Certainly, if B = BnA 2 
and C = CIIA3, then KR(B, C) = KR(B, C) and thus, ^ red (B, C) = 3red(B, C). From Corollary 



3.15 



we 

find that if a CP decomposition is unique up to scaling and permutation then KR(B, C) = KR(B, C) can 
only hold when B and C is a scaled and permuted version o/B and C. 

Remark 3.17 The reduced functional and its analysis is equally well doable for higher order tensors as 
well, e.g., in a forth order decomposition 

R 

{^)ijki — ^ a{ r bj r Ck7-di r . 
r=l 

The Khatri-Rao range KR(B, C) has to be replaced by the analogous set 

KR(B,C,D) = j A" = ^/iibiociodi G R JxKxL \ where m G K, | 

In our view, Corollary |3 . 1 5| displays one possible reason for the swamping effect. We explain this in the 
following subsection. 

3.3.1 One explanation of swamping 

The swamping phenomenon describes the effect that iterations method for minimizing the functional 
3(A, B, C) exhibit a long interval of iterations where the functional value remains almost constant and 
does not decrease. This is commonly seen in the ALS implementation. 



From the definition of Zred in (3.6) 



a(A fe +\B fe ,C fc ) =3 red (B k ,C k ). 
Moreover, for an iteration of the alternating minimization (ALS) procedure, we obtain 

3{A k ,B k ,C k ) >3(A k+ \B k ,C k ) =3 red (B k ,C k ) > 3(A fc+1 , B k+1 , C k ) 

> 3(A fc+1 ,B fc+1 ,C fc+1 ) 

> 3, ed (B fe+1 ,C fc+1 ). 
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Thus, the functional values 3(A fc ,B fc , C k ) will behave in a similar way as 3reci(B' c , C fc ). 

Corollary 3.15 can serve as one possible explanation of the swamping effect. It shows, that the dependence 
of the least squares functional on the matrices B, C is rather low, as it only depends on the Khatri-Rao range 
KR(B,C). In particular, if KR(B fe ,C fc ) = KR(B fc+J , C fc+J ) for some j iterations, then Zred{^ k , C k ) = 
3 r . e d(B fc+J ' -1 , C k+ i~ 1 ) and as a consequence 3(A fe+J , B fe+J , C fc+J ) will stay at the same value for these 
iterations. Moreover, the set of matrices that span the same linear space can be quite large which explains 
the large region at which least squares functional attains the same value. This also explains the increasing 
length of the swamps present in high-order n > 4 tensors; e.g., the subspace KR(B fc , C k , D fc ) corresponding 
to the functional -3(A fe , B fc , C fc , D fc ) = 3red(B fe , C k , D fe ) of a fourth-order tensor is spanned by a huge set 
of matrices of B, C and D. This reasoning can be underpinned by numerical calculations. 

In Figure [T]d, we measure the distance between subspaces (B fc C k ) and (B fc+1 C fe+1 ) (top-left) by 
taking an arbitrary vector x and calculating the norm difference of the projections of x onto the spaces 
(B k C k ) and (B fc+1 C fe+1 ). As seen in Figure [IJd, at the swamp regime, the norm differences in the 
subspaces dip down to 10~ 6 in the ALS implementation which coincides with our swamp explaination that 
KR(B fc ,C fe ) ps KR(B k+j , C fe+J ). The plots in Figure [l]} on the right column describe the measure of the 
subspaces spanned by k-th approximation A fc and the original factor A orig (top-right). Th e no rm distances 



are all fairly small, but relative to the yellow curves produced by using CALS (see Section 5.1), B fe and C fc 
subspaces from ALS are far off from the original subspaces B or .; g and C rig which, once again, indicates 
that KR(B fc , C fc ) KR(B fe+J , C fc+J ) for some j accounting for the ALS swamp. 

Another way to measure the distance between subspaces is through the condition number of the matrix 
[(B fe C k ) (B fc+1 C fc+1 )] as a way to measure linear independence. The bottom plot in Figure [l^, shows 
that when ALS is used, large condition numbers are present at the swamp regime, shooting up to 10 9 . 




2500 0000 






(a) The plot on top depicts an ALS (b) Measurement of Subspaces. Top Left: (B* C fc ) vs 
swamp while the bottom plot tracks (B fe+1 C fc+1 ), Middle Left: (A k C k ) vs (A fc+1 
the condition number of the matrix C fc+1 ), Bottom Left: (B fe A k ) vs (B fc+1 A fc+1 ). 
(B fe C fc B fe+1 C fe+1 ) Top Right: A fe vs A orig , Middle Right: B fc vs B ori9 , 

Bottom Right: C fe vs C or i g . 



Figure 1: ALS (red -+-) and CALS (yellow -x-) 
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4 Bounds on Zred and suboptimal solutions 



In this section we prove lower and upper bound on Zred using Theorem 3.14 Moreover, we will also define 
a dominating functional £, which minimizers can be calculated by standard linear algebra methods. The 
corresponding algorithm, called the Centroid Projection yields an initial guess, for a minimization step for 
Z- 

From Theorem |3.14| we can use the Eckard- Young theorem to obtain lower bounds: We keep the notation 
of Theorem 13.141 and Lemma 13.131 

Corollary 4.1 For all matrices (B,C), the lower bound of Zred is calculated as 

JK /min( J.K) \ 

inf WB,C)>-5> K) 2 (4- 1 ) 

( ' ' 8=1 \ k=R+l J 

if o~l the k-th singular values o/Vj. 

Proof. Recall that the Eckart- Young Theorem gives the infimum through the truncated SVD; i.e. 

(min{J,if} 
E K) 2 
— 
k— li I I 

where a z k 's are the singular value of V,. Also, observe that KR(B, C) contains matrices with rank at most 
R, hence, 

1 JK 

inf Zred(B,C) = inf - VA, inf (||V 4 -X|| F ) 2 

(B,C) (B,C) 2 ^— ' XeKR(B.C) V ' 

z— 1 




□ 



This corollary can be used to find lower bounds on the distance of a tensor to its best rank R approximation. 
In particular, if a tensor has rank R it must hold that 

JK / min( J,K) \ 

em e k) 2 =°- 

i=l y k=R+l J 

Note that the lower bound can a-priori be calculated by standard linear algebra method (eigenvalue and 
SVD decomposition). The computation requires an eigenvalue decomposition of M followed by a SVD of 
each of the matricized eigenvalues Vj. 

The next result establishes an upper bound using a dominating functional. 

Corollary 4.2 For all matrices (B,C), the upper bound of^ re d is calculated as 



inf 3red(B,C) < inf £(B,C) 

B,C B.C 



(4.2) 
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with 



Moreover, 



Proof. 



JK 



£(B,C):= inf V A, (|| V 4 - X|||) 

i—1 



(4.3) 



JK 



inf £(B,C) 

B,C 



inf ^AidlVi-XHl) 
rank(x)<_R~[ 



1 JK 

inf 3 red (B,C) = inf - VA, inf (||V 2 -X|| F ) 2 

B,C) (B,C) 2 ^— ' XGKR(B,C) v y 

i—1 

< inf inf - VAiOlVi-XM 

~ (B,C)XGKR(B.C) 2 ^ V " " ; 

i—1 

^ JK 

, inf ^E A *0i v *- x M 2 

rank(X)</?. z - =1 

The last equality follows from the fact that KR(B, C) contains matrices with rank < R. Moreover using 
the SVD, for any matrix S of rank at most R it follows that S G KR(B, C), where B, C are formed by the 
columns of orthogonal matrices in the SVD-decomposition of S. □ 



In contrast to Zred & minimizer of £ can be found rather easily. First, define 

V c := 



7 < ._ Yh=\ -W, 



(4.4) 



as the centroid matrix of Vj. 

Theorem 4.3 Let y^, be the left and right singular vectors in the SVD of V c defined in (4-4^ Then, 

Be = (yi • ■ ■ y R ) C c = (zi-..zr) 

is a minimizer o/£(B,C). Moreover, 



inf £(B, C) = - 

B,C v ' ' 2 



' JK 



JK 



(1-||V C |||) 



min{ J,K} 

E A + (E A 0( E ^ c ' 2 



k=R+l 



Proof. Expanding the square using ||Vi|j|n = 1 yields 



JK 



' JK 



JK 



JK 



]>>||V S ;-X|| 2 F = 5>UVillF -2(^A,V i( X)+ ^> <X,X) 

i=l \i=l / \i=l / \i=l / 

= (X>) (1-2(V C ,X) + (X,X» 

= (X>1 (1-||V C ||^ + ||V C |||-2(V C ! X) + (X ! X)) 



' JK 



JK 



5> (i-iiv c ii^)+ j> iiv c -xii 



Hence, 



inf£(B,C) = - 



JK 



JK 



E a m ( 1 -iiv c Hf)+ E a < 



\i=l 



inf IIV^-XH 
rank{x}<fl 
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Using again the Eckart-Young Theorem, we see that a minimizer X is found through the truncated SVD of 

R 

fe=i 

Defining B and C as in the theorem we have that X € KR(B, C) and thus (B, C) yields a minimizer of 
£(B,C). □ 



Remark 4.4 Computing minimizers of £ as in Theorem \4-S\ yields matrices B and C which in turn ap- 
proximate the minimizers of Zred- Theorem \3.14\ yields also an simple algorithm using only linear algebra 
to calculate minimizers of £. We refer to this computation of an initial guess as the Centroid Projec- 
tion algorithm. See Figure^p, (Step 1-5) for a detailed explanation of the implementation of the Centroid 
Projection algorithm. 



Combining Corollary |4.1| and Theorem |4.3| yields the following a-posteriori bounds on the quality of the 
output of the Centroid Projection algorithm. 



Corollary 4.5 Let Be and Cc be as in Theorem \4-3\ Then 

\ / JK \ / R \ \ 

„ „ - 1 ' (4.5) 

Proof. Note that the Frobenius norm can be expressed via the sing ular values ||Vi||| = £fe=i(<4) 2 - With 



\3 red (B c ,C c )- mf X«,(B,C)| < - \jr\ (EK) 2 ) - ($>] (E^(V c ) 

( ' ' \i=l \k = l ) \i=l / \k=l 



the normalization condition ||Vi||j. = 1, Corollary 



4.1 



and Theorem 



4.3 



the result follows. □ 



Remark 4.6 The possitivity of the right hand side in this estimate is a consequence of the convexity of the 
sum of the square of the largest singular values (i.e. the Schatten norm). 

The output of the Centroid Projection algorithm can be used as sensible initial guesses for any current 
numerical methods for CP decomposition. Commonly, CP methods are initialized with random guesses 
which at times lead slow convergence rate. In Section|5j we describe how the Centroid Projection algorithm 
is able to mitigate the swamping effect which are often present in the ALS algorithm. 

which first 

5 Numerical computation using the CPCP method 

The Centroid Projection algorithm yields an initial guess which in turn can be combined with any iterative 
method for computing a CP approximation. We will for short refer to any combination of an iterative 
scheme using the Centroid Projection an an initial guess as a CPCP method. 

5.1 CPCP with ALS schemes 

Here we described some CP tensor decomposition numerical techniques based on the least-squares method. 
Matricizing T ~ ^2 r —i a r o b r o c r leads to three equivalent expressions: 

T JKXI k(Bq C)A T T A-/x J w (c Q A)B T and T IJxK ^ (A Q B)C T 

To approximate the factors, three linear least-squares are solved iteratively: 

A k+1 = min3(A,B fe ,C A; ) 

A 

B fc+1 = min3(A fe+1 ,B,C fe ) 
C k+1 = min3(A fc+1 ,B fc+1 ,C fe ) 
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• Alternating Least-Squares (ALS) [2, 7\. 3(A,B k ,C k ) = \\T JKxI - (B k C k )\\ 2 F , 
3(A fe+1 , B, C k ) — \\T KIXJ = (C© A)B T ||2, anda(A fc+1 ! B fe+1 ,C' £ ) = ||T /JX - ft: - (A©B)C T ||2,. 

• Regularized Alternating Least-Squares (RALS) Q2]. 3(A,B fe ,C fe ) = ||T JA ' x/ -(B fc ©C fc )|||,+ 
a k \\A- A fe ||2,,a(A fc +\B,C fc ) =\\T KI * J - (C A)B T ||| + Qfc ||B - B k \\ 2 F and 
3(A k+1 ,B k+1 ,C k )= ||T /JxK -(A0B)C T |||+a fe ||C-C' £ ||| where a k is the regularization parameter. 

• Rotationally Enhanced Alternating Least-Squares (REALS) [20j. The functional J is similar 
to the ALS functional. However, 

A fe+i A k+i + dA k 
B k+i < — B fc+1 + dB k 
C k+1 < — C fe+1 +dC k 



where dA — AR, dB = BR, dC = CR and R is the rotational matrix. 



The upper bound of Zred in Theorem |4.3| provides approximations for the factor matrices, closely esti- 
mating the solution subspaces. We called this method the Centroid Projection algorithm; it is summarized 
in Figure[2]a,. Note that the Centroid Projection algorithm of Theorem 43 is contained in Step 1-5. Steps 6- 



7 in Figure [2^, repeat the algorithm by interchanging the role of A, B, C. We observed a smaller initial 
residual error with this modification in most our numerical examples. The following CPCP methods, use 
initial conditions derived from the Centroid Projection and as an CP approximation the method ALS, RALS 
and REALS. We refer to them as Centroid- ALS (CALS), Centroid-RALS (CRALS) and Centroid-REALS 
(CREALS), respectively. In Figure ^p, we compared all six methods. Recall that a swamp is identified 
in a log error plot with an plateau and an extremely high number of iterations in order to converge. In 
most of our examples, both REALS and CREALS performed the fastest while ALS is the slowest, almost 
always hampered by a swamp. RALS, CRALS and CALS were comparable methods in performance, all 
dramatically decreasing the ALS swamp. 



Input : TeH' 
Output: Atnit, 



, and Ci, 



Step 1. Compute M e M JKXJ - K 

M (fc < Maps-, <= %\ 

Step 2 . Compute EVD of M 
eigenpair of rvl 

obtain V ( £ R J * K < «j £ H JK 

Step 3. Compute the centroid matrix Vp £ R J 

vc ._ sa *.y. 

Step 4. Compute the truncated SVD of Vf 

B-UeE Jxfi 

C = V E S. Ky R 
Step 5. Compute A;„ it = A(B, C) £ K IxR 



\T(B,C) 



with G (BGC ) = (BOC) T (BOC) and T(B, C) = £\ T;(B C) 3 t 
Step 6. Compute B; Tlit = B(A, C) £ K JXR 
repeat Steps 1-5 

redefine M £ R' Kxni from M^s-, = Ej T aJ0 T Sjl 
B« t .— G (Aeo! \T(A,C) 
Step 7. Compute C;„ jt = C(A, B) £ R K * R 
repeat Steps 1-5 



redefine M G K' JXIJ from M c 



J (Aes) 



B)\T(A,B) 




(a) Centroid Projection Algorithm 



(b) Alternating CP Methods: (ALS, RALS, REALS) with Random 
Initial Conditions and (CALS, CRALS, CREALS) with Centroid Ini- 
tial Conditions 



Figure 2: CP Methods with Random and Centroid Initial Conditions. 



The Centroid Projection method helps mitigate the effects of ALS swamps by providing a good set of 
initial factors lying close to the true solution subspaces. 
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5.2 Symmetric CPCP 



-RALS 
REALS 



CRALS 
CREALS 



100 200 300 400 500 BOO 700 GOO 



(a) 



00 



Figure 3: Fully Symmetric CP Decomposition. 



The current methods for CP decomposition do not guarantee factorization with identical factors. In fact, 
when ALS is used in finding tensor decomposition with identical factors, the ALS algorithm will converge to 
a decomposition with no identical factors. Full and partial symmetries in tensor decomposition are referred 
to decomposition with at least two factors being identical. In a recent work of Stegeman }22| , the existence 
and uniqueness of the nth order tensor decompositions with some form of symmetry have been studied for 
n = 3, 4, 5. In other works [TJ [5], symmetries are also described by the permutation of the multi-indices of 
the tensor elements. An example is the following: if tijk = tjki = tkij, then T <E R NxNxN with a tensor 
rank R is fully symmetric and its factors are A = B = C £ M. NxR . Another example is if Ujki = tkiij, 
then T € ^MxNxMxN a tensor rank R is partially symmetric and its factors are A = C £ M. MxR and 
B = De R NxR . 

When the Centroid Projection algorithm is applied to the CP methods (CALS, CRALS, CREALS) for 
symmetric decomposition, the methods with the centroid starters are guaranteed to converge to identical 
factors provided that the tensor dimensions and order satisfy the uniqueness and existence conditions of 
Kruskal [13] and Stegeman [22]. In the case that we have fully (partially) symmetric tensor, then I = J = K 
(I = J or J = K or I = K). Thus, from the EVD of Vj € R lxl , we obtain the minimizers B = C in 
Theorem 03]. 

Figure [3] consists of plots of the number of iterations vs residual errors \\Torig — Test\\ 2 p for symmetric 
tensor decomposition with identical factors A = B = C. For symmetric CP decomposition, CREALS has 
outperformed the other iterative methods in Section 5.1 up to a factor of 10 4 while ALS has been consis- 
tently slow. ALS, RALS and REALS used random initial factors while CALS, CRALS and CREALS used 
calculated factors via the Centroid Projection algorithm. In most cases, these CPCP methods converged 
faster than the random-initialized CP methods. 
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